We undertook a broad program of research into image understanding techniques suited to a variety of applications. We divided our tasks into three major categories; however, we wish to emphasize that the different tasks are highly interrelated and share many common techniques. The major task areas over the course of this contract were three-dimensional vision (including descriptions from range data, shape inference from images, and object recognition), motion analysis, and parallel processing. This report discusses the status of the various individual research projects funded by this contract.

1 INTRODUCTION

This report summarizes research progress over the term of the contract. This introduction gives a brief overview of the work, with more detail provided in later chapters. Longer discussions of many of these projects have appeared in the Image Understanding Workshop Proceedings for January 1992 and April 1993; we also provide references to these and to other published descriptions of this work. Additionally, many of the papers have appeared in the Annual Technical Reports for 1991, 1992, or 1993.

We undertook a broad program of research into image understanding techniques suited to a variety of applications. We divided our tasks into three major categories; however, we wish to emphasize that the different tasks are highly interrelated and share many common techniques. The major task areas over the course of this contract were three-dimensional vision (including descriptions from range data, shape inference from images, and object recognition), motion analysis, and parallel processing. This introduction briefly outlines each task area from the proposal and discusses the status of work on that topic; the following sections discuss these tasks in more detail.

1.1 Three-Dimensional Vision

The ability to describe and recognize 3-D objects is needed for many tasks, including manufacturing robotics as well as outdoor object recognition. For this project, our goals included:

• Develop formal methods to represent 3-D objects, in their entirety or their visible faces only, in terms of volumetric or surface descriptors for polyhedra and free-form sculptured objects.

• Develop techniques to robustly compute these descriptors in real images in the presence of noise, shadows, surface markings, and occlusion.

• Develop techniques that process input from different sources, including range, stereo, and monocular intensity images.

• Develop techniques that use the description of 3-D objects for robust recognition and pose identification, taking advantage of the richness of our high-level symbolic descriptions to perform indexing into the database of stored objects.

• Develop a system to automatically acquire the description of a single 3-D object model from a series of views.


Final  Report 




These goals were addressed in several research projects:

• Developed a system to generate three-dimensional descriptions from range data using deformable models to describe arbitrary shapes; see Section 2.1.2 and [28,32].

• Developed programs to generate 3-D descriptions from single gray scale images based on analysis of contours; see Section 2.4 and [66,67]. This work involves theoretical analysis of the appearance of contours and implementations of these theories.

• Developed a system for generating segmented 3-D descriptions from gray scale images using symmetries; see Section 2.1.3 and [54].

• Developed techniques for generating 3-D descriptions from stereo views, applied especially to buildings [7,8].

• Developed general perceptual grouping techniques that have been an important part of this work; see Section 2.3 and [15].

• Used three-dimensional and two-dimensional descriptions of objects for recognition in several task domains; see Section 2.2 and [56].

• Developed two complementary systems that combine range images from several different directions to produce a complete model of the object. The first combines the range images to generate a complete description [1,3]. The second generates descriptions for each view and combines these into a boundary-based description of the object [38]. See Section 2.1.1 for more information.

1.2 Motion Analysis

This area includes the detection and description of moving objects and the inference of the three-dimensional structure of the environment. We proposed to develop the fundamental components needed in a general motion understanding system and to demonstrate their viability on some specific scenarios. Our goals in the analysis of image sequences included:

• Develop techniques for estimating 2-D (image plane) motion and object segmentation in the presence of occlusion in closely spaced image sequences (i.e., spatio-temporal data), and for inferring the object structure from the data.

• Develop techniques for matching contours and other features that apply to broad classes of scenes and motions. This will include the use of feedback from higher levels of processing, such as motion estimation and 3-D inference.

• Develop a system for effective 3-D inference from motion sequences, combining the computation of depth and structure from motion with other general techniques, and for merging local 3-D scene descriptions into global descriptions.

• Develop an integrated motion analysis system and test it on increasingly difficult scenarios. The system will use existing analysis modules, but will include interaction between modules and feedback of results to the feature extraction process.

Through several projects we addressed these goals:

• Developed a slice-based analysis technique that computes dense optical flow estimates for arbitrary observer motion in closely spaced data; see Section 3.1 and [43].

• Developed a program for contour matching in image sequences [13] and a second system for matching both region and corner features within a larger motion analysis system; see Section 3.3 and [24].

• Developed a three-dimensional motion estimation system that uses an arbitrary sequence (at least three frames) of matched points. This system allows for constant motion and a limited case of accelerations, and it handles occlusions and missing matches. See Section 3.2 and [11].

• Developed an integrated system that combined automatic feature extraction and matching with the 3-D motion estimation system to produce 3-D descriptions of detected objects. The combination included feedback of motion and structure estimates to the matching, to eliminate poor matches and find missing features. This work is discussed more fully in Section 3.3 and [26].

• Developed a trinocular (three-camera) vision system for guidance and obstacle avoidance on a mobile platform. The three cameras simplify the stereo computations and produce rapid, reliable estimates of structures in indoor environments; see Section 3.4 and [23].

1.3 Parallel Processing

This area includes work on mapping existing algorithms onto available parallel machines and the study of how parallel architectures can be designed to aid in the development of parallel algorithms. The development of parallel techniques will support efficient implementations of the systems developed in the other tasks. Our initial goals were:

• Develop a mapping of image analysis algorithms onto massively parallel architectures.

• Develop techniques for describing the communication necessary in parallel implementations of symbolic image analysis algorithms.

We addressed these goals in two projects on mapping computer vision algorithms to parallel architectures:

• We analyzed the communication requirements for implementing computer vision algorithms on a variety of existing parallel architectures. This is discussed in Section 4.1 and [22,50].

• We developed a general analysis technique to study different massively parallel architectures and their application to a variety of computer vision algorithms. This is described in Section 4.5 and [49,51].

2 THREE-DIMENSIONAL VISION

2.1 Description of 3-D Objects

2.1.1 Integration from Multiple Views

We have developed systems for building models from unregistered multiple range images [39,2]. The latter system integrates views at the level of triangulated surfaces rather than at the pixel level. A triangulated surface model can represent a variety of solid objects and can, in theory, achieve any resolution. Such models are not ideal representations for high-level vision tasks such as recognition, however: first, the representation is still low level, and second, it is sensitive to many parameters and is therefore unstable. Nevertheless, we think it is a good intermediate representation for integration and for building high-level descriptions through surface interpolation from the triangulation.

2.1.2 Deformable Models

A second project in range analysis involves the use of deformable surfaces to generate a 3-D approximation of range data. This work, performed by C. Liao and G. Medioni, builds on our earlier work on “B-Snakes.” The user provides an initial simple surface, such as a cube, which is subject to internal forces (describing implicit continuity properties such as tension and bending) and to external forces that attract it toward the data points. The problem is cast in terms of energy minimization. We solve this non-convex optimization problem using the well-known Powell algorithm, which guarantees convergence to a (possibly local) minimum and does not require gradient information. The variables are the positions of the control points, and the number of control points is adaptively controlled. This methodology leads to reasonable complexity and good numerical stability. We also provide a novel solution to the problem of subdividing a patch when the fit is poor, and we show results on real range images to illustrate the applicability of our approach. The advantages of this approach are that it provides a compact representation of the approximated data and lends itself to applications such as non-rigid motion tracking and object recognition. Currently, our algorithm gives only a C^0-continuous analytical description of the data, but owing to the flexibility of our adaptive approach it should be straightforward to upgrade it to C^1 or C^2. This work is discussed in more detail in [28].
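To make the energy formulation concrete, here is a minimal Python sketch under simplifying assumptions of our own: a closed 2-D contour instead of a 3-D surface, a fixed number of control points rather than adaptive subdivision, hypothetical weights `alpha` (tension), `beta` (bending) and `gamma` (data attraction), and a simple derivative-free compass search standing in for Powell's method. It illustrates the idea only and is not the implementation of [28].

```python
import math

def contour_energy(pts, data, alpha, beta, gamma):
    """Energy of a closed contour: tension + bending (internal) + data attraction (external)."""
    n = len(pts)
    internal = 0.0
    for i in range(n):
        px, py = pts[i]
        qx, qy = pts[(i + 1) % n]          # next control point
        rx, ry = pts[i - 1]                # previous control point (index -1 wraps around)
        internal += alpha * ((qx - px) ** 2 + (qy - py) ** 2)      # tension
        bx, by = rx - 2 * px + qx, ry - 2 * py + qy
        internal += beta * (bx * bx + by * by)                      # bending
    # External term: each data point is attracted to its nearest control point.
    external = sum(min((dx - x) ** 2 + (dy - y) ** 2 for x, y in pts) for dx, dy in data)
    return internal + gamma * external

def fit_contour(pts, data, alpha=0.1, beta=0.1, gamma=1.0,
                step=0.5, tol=1e-4, max_sweeps=10000):
    """Minimize contour_energy over the control-point positions with a
    derivative-free compass search (a simple stand-in for Powell's method)."""
    x = [c for p in pts for c in p]        # flatten control points into one vector

    def f(vec):
        return contour_energy(list(zip(vec[0::2], vec[1::2])), data, alpha, beta, gamma)

    best = f(x)
    sweeps = 0
    while step > tol and sweeps < max_sweeps:
        sweeps += 1
        improved = False
        for k in range(len(x)):
            for delta in (step, -step):
                x[k] += delta
                e = f(x)
                if e < best:               # keep the move
                    best, improved = e, True
                    break
                x[k] -= delta              # undo the move
        if not improved:
            step /= 2                      # no move helped: refine the step size
    return list(zip(x[0::2], x[1::2])), best
```

For example, with data points sampled on a circle of radius 2 and an initial octagon of radius 0.5, the control points should expand toward the circle, with the internal terms pulling them somewhat inside it.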

2.1.3 Segmented Volumetric 3-D Descriptions

We address the problem of recovering segmented, hierarchical, volumetric descriptions of three-dimensional shapes. In earlier work [52,54], we suggested a method using smoothed local symmetries (SLS) for obtaining hierarchical axial descriptions of planar shapes, together with a decomposition of the shapes into their parts. Unfortunately, it is not straightforward to extend these methods to three-dimensional shapes, because in three-dimensional space the SAT (symmetric axis transform) and SLS axes are, in general, not curves but surfaces, leading to unnatural descriptions [79].

In the current work, performed by H. Rom and G. Medioni, we restrict ourselves to three types of parts: convex blobs (or ovoids, borrowing the terminology of Koenderink [75]), Straight Homogeneous Generalized Cylinders (SHGCs [82]), and Planar Right Constant GCs (PRCGCs [63]; planar axis and constant cross section). These components cover many of the man-made objects encountered in practice. We suggest using properties of the parabolic curves (zero crossings of the Gaussian curvature) to recover the cross sections and axes of the different parts. We advocate the parabolic curves over the often-used occluding contours, which are unstable in range data. We assume that the shapes are C^2 continuous (i.e., the curvature is defined everywhere). Unlike several other authors, we do not assume that the parts are cut along a cross section or that a cross section is visible. Furthermore, we do not assume the existence of any discontinuity edges between parts. We believe that the case of parts joined discontinuously is the limiting case of the more general continuous case, which we address.

Given the 3-D surface data, whether from a CAD model, from registered range images [2], or from a single range image, we first recover the parabolic curves on the surface. This requires evaluating the sign of the Gaussian curvature of the surface patches, a process that has been shown to be stable and reliable [70,80,9]. The parabolic curves may lie either on the surfaces of the individual parts or on the border of the “glue” between parts. Note that, due to the transversality principle [72], there is almost always an anticlastic (negative Gaussian curvature) region between convex parts when they are joined. The parabolic curves on the parts we consider can be either meridians or cross sections of the SHGC and PRCGC parts (this has been shown for SHGCs [81], and we have proven it for PRCGCs). Using simple tests, we can hypothesize (or in many cases determine) the role of each parabolic curve. We can therefore segment the object into parts and, based on the properties of the specific parts, recover the axis of each part from its meridians and cross sections.
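The first step, finding parabolic curves as sign changes of the Gaussian curvature, can be sketched in Python as follows. This assumes the surface is given as a noise-free range image z = f(x, y) on a uniform grid (real data would need smoothing first). For such a surface K = (f_xx f_yy - f_xy^2) / (1 + f_x^2 + f_y^2)^2, and since the denominator is positive, the sign of K is the sign of the Hessian determinant. The function name and the test surface are illustrative only.

```python
def parabolic_pixels(f, xs, ys):
    """Return grid points (x, y) where the sign of the Gaussian curvature of
    z = f(x, y) changes between a pixel and its right or lower neighbour."""
    h = xs[1] - xs[0]                # assumes equal, uniform spacing in x and y
    z = [[f(x, y) for x in xs] for y in ys]

    def hessian_det(i, j):           # central differences; i indexes y, j indexes x
        fxx = (z[i][j + 1] - 2 * z[i][j] + z[i][j - 1]) / h ** 2
        fyy = (z[i + 1][j] - 2 * z[i][j] + z[i - 1][j]) / h ** 2
        fxy = (z[i + 1][j + 1] - z[i + 1][j - 1]
               - z[i - 1][j + 1] + z[i - 1][j - 1]) / (4 * h ** 2)
        return fxx * fyy - fxy ** 2

    rows, cols = len(ys), len(xs)
    positive = {(i, j): hessian_det(i, j) > 0
                for i in range(1, rows - 1) for j in range(1, cols - 1)}
    marked = []
    for (i, j), s in positive.items():
        right, down = positive.get((i, j + 1)), positive.get((i + 1, j))
        if (right is not None and right != s) or (down is not None and down != s):
            marked.append((xs[j], ys[i]))
    return marked
```

For the hypothetical surface z = x^4/12 - x^2/2 + y^2/2, the sign of K follows x^2 - 1, so the marked pixels lie next to the parabolic lines x = ±1.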

One remaining problem is that some parts cannot be found until other parts are removed. As in [52] and [54], we take a hierarchical strategy in which, at each step, well-defined parts are described and removed; once these parts are removed, the next-level parts can be described. This process is efficient and produces a decomposition of the shape into its intuitive parts, with a stable axial description of each part.

2.2 Object Recognition

The more interesting problem is to recognize 3-D objects from gray-level images. Here the previous methodology becomes very inefficient, as the number of generated hypotheses increases drastically. We propose instead to generate high-level groupings and to use these as matching primitives. The groupings we use are based on parallel and skew symmetry, U-shapes, and closures. Furthermore, we show how to compute these groupings efficiently from segments and how to keep the number of groupings small. We have obtained encouraging initial results on real images [7].

As an application of our matching methodology, we study the “drop-off” problem, in which an observer is given a topographic map and is dropped off at an unknown location. We select as the matching feature the panoramic horizon curve (corresponding to the sky-ground boundary from a given viewpoint). The polygonal approximation of this curve is compared with precomputed ones using our hash-based scheme [56]. We have obtained accurate results from real data.
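This summary does not specify the coding scheme of [56], so the following Python sketch only illustrates the general idea of hash-based indexing of polygonal approximations. The per-vertex code (quantized turning angle and log ratio of adjacent edge lengths), the bin sizes, and the function names are our own assumptions. These codes are unchanged by rotation, translation, and uniform scaling, and each key of a query votes for the stored models that share it.

```python
import math
from collections import defaultdict

def polygon_codes(poly, angle_bin=math.radians(15), ratio_bin=0.1):
    """Quantized (turning angle, log edge-length ratio) at each vertex of a closed polygon."""
    n = len(poly)
    codes = []
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = poly[i - 1], poly[i], poly[(i + 1) % n]
        e1, e2 = (bx - ax, by - ay), (cx - bx, cy - by)
        turn = math.atan2(e1[0] * e2[1] - e1[1] * e2[0], e1[0] * e2[0] + e1[1] * e2[1])
        ratio = math.log(math.hypot(*e2) / math.hypot(*e1))
        codes.append((round(turn / angle_bin), round(ratio / ratio_bin)))
    return codes

def build_index(models):
    """Hash table from pairs of consecutive vertex codes to the models containing them."""
    index = defaultdict(list)
    for name, poly in models.items():
        codes = polygon_codes(poly)
        for i in range(len(codes)):
            index[(codes[i], codes[(i + 1) % len(codes)])].append(name)
    return index

def best_match(query, index):
    """Each key of the query votes for every model stored under that key."""
    votes = defaultdict(int)
    codes = polygon_codes(query)
    for i in range(len(codes)):
        for name in index.get((codes[i], codes[(i + 1) % len(codes)]), []):
            votes[name] += 1
    return max(votes, key=votes.get) if votes else None
```

Because keys are pairs of consecutive codes, the starting vertex of the query does not matter. Codes that fall near a bin boundary can flip under noise; a practical scheme would, for example, also look up neighboring bins.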

We have defined a methodology based on efficient coding and hash tables to recognize 3-D objects given 3-D data, even when the number of models is large [57]. We have performed a detailed complexity analysis of the method, which shows that $O(n) \le \text{recognition cost} \le O(nm^3)$, where $n$ is the number of matching primitives and $m$ is the number of models in the database. The worst case occurs when the models hypothesized to be in the scene are very similar.

2.3 Perceptual Grouping

Most high-level vision algorithms, such as shape from contour [60] or line drawing interpretation, require perfect data as input, but such features cannot be generated reliably by low-level algorithms such as edge detectors. Here, we try to bridge this gap by transforming an edge image into a saliency map. This approach uses a non-iterative method based on a field associated with each edge; the field encodes the notions of simplicity, curvature constancy, and co-curvilinearity. A detailed report on this effort is given in [15].
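The saliency computation can be illustrated with a small Python sketch. We assume a field of the general kind described above: each edgel votes along circular arcs, with a strength that decays with arc length s and curvature κ as exp(-(s² + cκ²)/σ²), and each vote is weighted by how well the receiving edgel's orientation agrees with the arc's tangent. The exact field of [15] is not given in this summary, so the functional form, the 45° cutoff, and the parameters `sigma` and `c` are assumptions.

```python
import math

def field_strength(dx, dy, sigma=5.0, c=10.0):
    """Vote cast by an edgel at the origin, oriented along +x, at offset (dx, dy).
    Returns (strength, tangent angle of the connecting arc at the receiving point)."""
    l = math.hypot(dx, dy)
    if l == 0:
        return 0.0, 0.0
    theta = math.atan2(dy, dx)
    if theta > math.pi / 2:              # points behind the edgel: fold so voting is symmetric
        theta -= math.pi
    elif theta < -math.pi / 2:
        theta += math.pi
    if abs(theta) > math.pi / 4:         # would need too sharp a turn: no vote
        return 0.0, 0.0
    if theta == 0:
        s, kappa = l, 0.0                # straight continuation
    else:
        s = theta * l / math.sin(theta)  # arc length of the circle tangent at the origin
        kappa = 2 * math.sin(theta) / l  # curvature of that circle
    return math.exp(-(s * s + c * kappa * kappa) / sigma ** 2), 2 * theta

def saliency(edgels, sigma=5.0, c=10.0):
    """edgels: list of (x, y, phi). Accumulate field votes from all other edgels,
    weighted by orientation agreement with the predicted tangent."""
    out = []
    for xi, yi, phi_i in edgels:
        total = 0.0
        for xj, yj, phi_j in edgels:
            if (xi, yi) == (xj, yj):
                continue
            dx, dy = xi - xj, yi - yj    # express the receiver in the voter's frame
            u = math.cos(phi_j) * dx + math.sin(phi_j) * dy
            v = -math.sin(phi_j) * dx + math.cos(phi_j) * dy
            strength, tangent = field_strength(u, v, sigma, c)
            total += strength * math.cos(phi_i - (phi_j + tangent)) ** 2
        out.append(total)
    return out
```

Edgels on a smooth curve reinforce each other, while isolated edgels with inconsistent positions or orientations receive little support.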

2.4 Shape Analysis from Monocular Images

In this project, we are developing techniques for inferring 3-D shape descriptions given only object contours. We have developed a theory that can infer the shape of several classes of objects, namely zero-Gaussian-curvature surfaces, straight homogeneous generalized cylinders, and planar, right, constant cross-section generalized cylinders. One of the recent advances here is the extension of our techniques to infer the shape of objects made of multiple curved surfaces. We have also started to develop techniques to make our theory work with real images, where contours are likely to be fragmented and distracting contours such as markings and shadows are present. We report on these efforts in [66,67].

We have continued our effort to understand how to infer shape from monocular images using contours. First, we developed a theory of invariants of projected contours and showed how they can be used to infer the 3-D shapes of certain classes of surfaces [61,64,65]. In recent work, we have developed a system for generating volumetric 3-D shape descriptions from real images containing Straight Homogeneous Generalized Cylinders (SHGCs). The image may contain multiple, occluding objects, and the objects may have surface markings. In working with real images, we must deal with fragmented boundaries and with many additional boundaries due to markings, shadows, highlights, and noise. We use the expected properties of the desired contours to separate the two sets of boundaries and to complete the broken boundaries. Details of this process are given in a paper in the recent IU Workshop [66].

Continuing this work, we are also studying the class of curved generalized cylinders with circular but changing cross sections. In this case, we are unable to find invariants for the visible boundaries. However, we are able to find good quasi-invariants, which show that commonly used ribbon descriptions are in fact stable for such objects. Our future work will focus on compound objects that combine a number of the primitives we have analyzed in the past.




3 MOTION ANALYSIS

3.1 Spatio-Temporal Analysis

The goal of our work in spatio-temporal analysis is to generate a dense optic flow map from a motion sequence. Because 0-D features (e.g., corners) and 1-D features (e.g., curves) are sparse, we feel that 2-D features (e.g., regions) are more likely to produce dense motion estimates. Early work in spatio-temporal analysis includes that of [71]. Our work began [41,42] with the extraction of paths in slices taken in the temporal direction of the spatio-temporal data volume (i.e., the paths of an object point through time and space). This produces an image velocity estimate only along object contours.

To generate a dense displacement field, more analysis of the slice data is needed. Strips corresponding to trapezoidal regions found in the slices through the temporal dimension of the image volume are constructed for selected orientations throughout the image. These extracted strips provide estimates of the velocity component along the slice orientation. The velocity estimates from different slice orientations are combined to compute the velocity constraint for each pixel. A voting scheme is used to extract the position of the Focus of Expansion, which can then be used to compute the true velocity of the pixels.
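The two steps after strip extraction, combining component estimates into a velocity and voting for the Focus of Expansion (FOE), might look as follows in Python. We assume that each slice orientation θ yields the component of the image velocity along the unit vector (cos θ, sin θ), that at least two distinct orientations are available per pixel, and, for the vote, that the observer translates so that flow vectors lie on lines through the FOE. The function names and the candidate grid are hypothetical.

```python
import math

def velocity_from_components(components):
    """Least-squares image velocity (vx, vy) from (theta, c) pairs, where c is the
    velocity component measured along a slice with orientation theta."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for theta, c in components:
        u, w = math.cos(theta), math.sin(theta)
        a11 += u * u
        a12 += u * w
        a22 += w * w
        b1 += u * c
        b2 += w * c
    det = a11 * a22 - a12 * a12          # non-zero when two orientations differ
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def vote_foe(points, flows, candidates, tol=0.5):
    """Each flow vector defines a line through its pixel; every candidate position
    within tol of that line receives one vote. Returns (best candidate, votes)."""
    best, best_votes = None, -1
    for gx, gy in candidates:
        votes = 0
        for (px, py), (vx, vy) in zip(points, flows):
            norm = math.hypot(vx, vy)
            # perpendicular distance from the candidate to the flow line
            if norm > 0 and abs((gx - px) * vy - (gy - py) * vx) / norm < tol:
                votes += 1
        if votes > best_votes:
            best, best_votes = (gx, gy), votes
    return best, best_votes
```

Once the FOE is known, the flow direction at each pixel is fixed (radial from the FOE), which is why a voting step of this kind can help recover full velocities where the component estimates alone are weak.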

This process is very expensive (requiring hours on serial machines), but most of the computation is easily performed on the SIMD architecture of the Connection Machine. The algorithm was transferred to a CM-2 with very good computational speed-up. Much of this work was reported in previous years and has now been completed, with more detail given in [43].

3.2 Motion Estimation

We have continued our exploration of techniques for computing structure from motion using feature matches through multiple frames. The use of multiple (as opposed to two) frames is desirable for several reasons:

• to increase the robustness of the solution,

• to allow recovery of structure and motion with fewer features being tracked, and

• to allow estimation of “higher order derivatives” of the motion.

We have completed the development and implementation of an algorithm for the shape-from-motion problem, given point feature correspondences and perspective projection. This solution works for a class of motions called chronogeneous motion, which includes uniform acceleration and constant-angular-velocity rotation and translation as special cases. The solution uses an iterative algorithm that recovers the three-dimensional motion of the feature points and the three-dimensional location of each feature in each frame. An additional closed-form algorithm that recovers motion and structure for uniform acceleration is used to generate initial guesses for the iterative procedure [10].
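To show why uniform acceleration admits a closed-form (linear) treatment, here is a Python sketch for a single point feature under uniform translational acceleration, P(t) = P0 + Vt + At²/2, observed under perspective projection x = fX/Z, y = fY/Z. Multiplying out gives xZ = fX and yZ = fY, which are linear in the unknowns; since depth is recoverable only up to scale, we fix Z0 = 1 and need at least four frames. This is our own illustration of the idea, not the closed-form algorithm of [10], which addresses rigid objects and the wider chronogeneous class.

```python
from fractions import Fraction

def project(P, f=1.0):
    """Perspective projection of a 3-D point onto the image plane."""
    X, Y, Z = P
    return f * X / Z, f * Y / Z

def solve_least_squares(rows, rhs):
    """Solve the normal equations (A^T A) u = A^T b exactly in rational arithmetic
    by Gauss-Jordan elimination, sidestepping their poor floating-point conditioning."""
    A = [[Fraction(v) for v in row] for row in rows]
    b = [Fraction(v) for v in rhs]
    n = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * bk for r, bk in zip(A, b))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def recover_uniform_acceleration(obs, f=1.0):
    """obs: list of (t, x, y) image observations of one point (four or more frames).
    Returns P0, V and A scaled so that Z0 = 1."""
    rows, rhs = [], []
    for t, x, y in obs:
        q = t * t / 2
        # x * Z(t) = f * X(t) with Z(t) = 1 + Vz t + Az q, rearranged to be linear
        rows.append([f, f * t, f * q, 0, 0, 0, -x * t, -x * q])
        rhs.append(x)
        rows.append([0, 0, 0, f, f * t, f * q, -y * t, -y * q])
        rhs.append(y)
    X0, Vx, Ax, Y0, Vy, Ay, Vz, Az = map(float, solve_least_squares(rows, rhs))
    return {"P0": (X0, Y0, 1.0), "V": (Vx, Vy, Vz), "A": (Ax, Ay, Az)}
```

With exact image data, the recovered quantities equal the true ones divided by the initial depth Z0. With noisy measurements, the linear solution would serve, as in the text, only as an initial guess for an iterative refinement.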

These algorithms are discussed further, with additional results, in [12] and in the thesis [11]. The results show that the algorithm performs well in recovering structure and motion parameters from feature point correspondences. We are using this motion estimation technique in our other motion work [24].

3.3 Integrated Motion System

Accurate motion estimation in feature-based analysis of an image sequence requires consistent feature extraction and reliable matching. Without a priori information, inconsistent feature extraction and erroneous matching are hard to detect and are often closely related. To address these problems, we developed an integrated feature-based system (using both image regions and corner features) that uses errorful data for motion estimation and overcomes these errors with feedback that improves both matching and feature extraction. The system acquires the initial estimates using a batch-mode analysis (as in [11]) and continues the processing using an incremental analysis of the input sequence. Thus we use three-dimensional motion estimation as an aid in generating the data needed by the motion estimation system itself. The program refines the initial noisy correspondence data by removing those parts that do not fit the estimated 3-D motion parameters; the motion parameters are in turn refined using the improved correspondence data. This process is iterated until a consistent feature set is obtained (usually after two or three iterations). To improve feature extraction, properties obtained from the corresponding object in other frames guide the segmentation process to improve the extracted region or to add missing regions to the sequence of tracked features. By using the regions that underlie the corners, we produce a rough reconstruction of surfaces in the environment from the sparse depth information at the corners.
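The feedback loop between motion estimation and matching can be illustrated by a deliberately simplified Python sketch: the motion model is a single 2-D image translation (a stand-in for the 3-D motion estimator of [11]), matches whose residual exceeds a hypothetical threshold `tol` are discarded, and the model is re-estimated until the match set stops changing. Feature re-extraction guided by other frames is not modeled.

```python
import math

def estimate_translation(matches):
    """Least-squares 2-D translation: the mean displacement of the matches."""
    n = len(matches)
    return (sum(b[0] - a[0] for a, b in matches) / n,
            sum(b[1] - a[1] for a, b in matches) / n)

def refine_matches(matches, tol=1.5, max_iter=10):
    """matches: list of ((x1, y1), (x2, y2)) correspondences between two frames.
    Alternate between fitting the motion model and removing inconsistent matches."""
    current = list(matches)
    model = estimate_translation(current)
    for _ in range(max_iter):
        kept = [(a, b) for a, b in current
                if math.hypot(b[0] - a[0] - model[0], b[1] - a[1] - model[1]) <= tol]
        if not kept or len(kept) == len(current):
            break                                  # consistent set reached
        current = kept
        model = estimate_translation(current)      # refit with the cleaned matches
    return model, current
```

With ten correct matches displaced by (2, 1) and three gross mismatches, the first fit is biased, but the mismatches already stand out; after one removal step the refitted model is exactly (2, 1).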

This system used several existing programs for feature extraction and matching and for estimating three-dimensional motion and structure from a set of matching points. These programs have limitations and produce errors such as missed or extra features and missed or incorrect matches. By incorporating these different programs into a single system with feedback, we have substantially reduced the impact of the errors and improved the final results of the analysis. In the papers and thesis on this work, we present results on standard real image sequences, which are a subset of the data on which the program has been tested. More details are presented in [24,26,27].

3.4 Mobile Platform

We have continued our robot project using trinocular imagery for guidance. We are investigating robot navigation in situations where only generic maps are available, one of the tasks being the generation of more complete maps. The visual navigation uses three views to improve the performance of the stereo system, in both the speed and the accuracy of the matching. Rather than producing a complete depth map, we are concerned only with producing a “squeezed 3-D map” that shows the corridor walls and the obstacles in the hallway. This work is discussed in more detail in [23].

4 PARALLEL PROCESSING

4.1 Algorithm Implementation on Existing Machines

We have studied parallel implementations of several high-level algorithms, such as relaxation labelling and graph matching. Our recent work has looked at the problem of geometric hashing, which is used for a variety of matching problems [58]. In earlier parallel implementations, the number of processors was independent of the size of the scene but depended on the size of the model database. In this work we have designed new parallel algorithms for both the MasPar and Connection Machine architectures, which improve on the earlier processor requirements and on the overall performance. Details of this work are given in [22]. A summary of our recent work is outlined below.
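For readers unfamiliar with geometric hashing [58], the serial form of the technique that such parallel algorithms distribute can be sketched in Python. This is the common textbook 2-D version, invariant to rotation, translation, and uniform scale; the quantization step `q`, the use of a single scene basis, and the function names are simplifications of ours, not details of [22].

```python
from collections import defaultdict

def basis_coords(p, b0, b1):
    """Coordinates of p in the frame that maps b0 to (0, 0) and b1 to (1, 0)."""
    ux, uy = b1[0] - b0[0], b1[1] - b0[1]
    n2 = ux * ux + uy * uy
    dx, dy = p[0] - b0[0], p[1] - b0[1]
    return (dx * ux + dy * uy) / n2, (dy * ux - dx * uy) / n2

def hash_key(coords, q=0.25):
    return round(coords[0] / q), round(coords[1] / q)

def build_table(models, q=0.25):
    """Preprocessing: for every model and every ordered basis pair, store
    (model, basis) under the quantized coordinates of each remaining point."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, b0 in enumerate(pts):
            for j, b1 in enumerate(pts):
                if i == j:
                    continue
                for k, p in enumerate(pts):
                    if k not in (i, j):
                        table[hash_key(basis_coords(p, b0, b1), q)].append((name, i, j))
    return table

def recognize(scene, table, q=0.25):
    """Recognition: pick one scene basis (here the first two points) and let every
    other scene point vote for the (model, basis) entries stored under its key."""
    votes = defaultdict(int)
    b0, b1 = scene[0], scene[1]
    for p in scene[2:]:
        for entry in table.get(hash_key(basis_coords(p, b0, b1), q), []):
            votes[entry] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None
```

The (model, basis) entry with the most votes is a hypothesis for both the model identity and the point correspondence. A full system would try several scene bases and verify each hypothesis; the table lookups and vote accumulation are the parts that parallel implementations spread across processors.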

4.1.1 Stereo and Image Matching

Stereo matching is one of the well-known methods for extracting depth information. Depth recovery is a crucial problem in image understanding, with applications in robotics and navigation. For stereo matching, we have proposed an $O(Nn^3/P)$-time algorithm on a $P$-processor fixed-size linear array, where $N$ is the number of line segments in one image, $n$ is the number of line segments in a window determined by the object size, and $P \le n$ [21]. This algorithm is a parallel implementation of the stereo matching algorithm proposed by Medioni and Nevatia in [31].

Discrete relaxation techniques have been widely used in computer vision and artificial intelligence. For the image matching problem, the discrete relaxation technique outlined in [30] leads to a sequential execution time of $O(n^3m^3)$ for labelling $n$ objects with $m$ labels. In [29] we have proposed a faster sequential algorithm for image matching, which runs in $O(n^2m^2)$ time, where $n$ is the number of line segments in the image and $m$ is the number of line segments in the model. A partitioned parallel implementation has also been developed using the proposed sequential algorithm; it achieves $O((nm/P + P)\,nm)$ time on a $P$-processor fixed-size linear array, where $P < nm$.
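As a reference point for the complexity figures above, a generic discrete relaxation (in the arc-consistency style) fits in a few lines of Python: a label is discarded from an object whenever some related object has no compatible label left, and the process repeats until nothing changes. This straightforward version is neither the O(n^2 m^2) algorithm of [29] nor its parallel partitioning; the data layout and the example constraints are hypothetical.

```python
def discrete_relaxation(labels, constraints):
    """labels: dict mapping each object to its set of candidate labels.
    constraints: dict mapping an ordered pair (i, j) to a predicate ok(li, lj);
    label li of object i survives only if some remaining label lj of j satisfies it."""
    labels = {obj: set(cands) for obj, cands in labels.items()}   # work on a copy
    changed = True
    while changed:                          # repeat until a fixed point is reached
        changed = False
        for (i, j), ok in constraints.items():
            supported = {li for li in labels[i] if any(ok(li, lj) for lj in labels[j])}
            if supported != labels[i]:
                labels[i] = supported
                changed = True
    return labels
```

For example, three objects A, B, C with candidate labels {1, 2, 3} and the constraints label(A) < label(B) < label(C), stated in both directions, relax to the unique assignment A = 1, B = 2, C = 3.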

4.1.2 Sorting on Reconfigurable Mesh

The Reconfigurable Mesh forms the CAAPP level of the Image Understanding Architecture (IUA) [83]. An optimal sorting algorithm on the Reconfigurable Mesh is derived in [20]. The algorithm sorts $n$ numbers in constant time using $n \times n$ processors; the best previously known result uses $O(n \times n\log^2 n)$ processors. Our algorithm satisfies the $AT^2$ lower bound of $\Omega(n^2)$ for sorting $n$ numbers in the word model of VLSI. A modification of the algorithm for area-time trade-off is shown to achieve the $AT^2$ lower bound over $1 \le T \le \sqrt{n}$; previously, the lower bound was achieved only over $\log n \le T \le \sqrt{n}$. Notice that, using sorting as a basic procedure, a number of low- and intermediate-level Image Understanding problems can be solved on the IUA.
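One standard way to see why n × n processors suffice for constant-time sorting is rank computation: processor (i, j) compares the i-th and j-th numbers, and the number of "wins" in row i is the final position of the i-th number. The Python sketch below simulates only this comparison matrix serially; the constant-time counting along rows that reconfigurable buses make possible is not modeled, and we do not claim that this is the algorithm of [20].

```python
def rank_sort(values):
    """Simulate an n x n processor grid: processor (i, j) holds the bit
    'value j must precede value i'. Row i's count of set bits is the final
    position of value i; ties are broken by original index so ranks are distinct."""
    n = len(values)
    bits = [[values[j] < values[i] or (values[j] == values[i] and j < i)
             for j in range(n)] for i in range(n)]
    out = [None] * n
    for i in range(n):
        out[sum(bits[i])] = values[i]      # row count = rank of value i
    return out
```

For instance, `rank_sort([3, 1, 2, 1])` places the two 1s at positions 0 and 1 (in their original order), the 2 at position 2, and the 3 at position 3.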

4.1.3 Graph Algorithms

Many intermediate-level computer vision tasks can be posed as graph
problems. In particular, digitized picture graphs (DPGs) of two and three dimensions
are of primary importance because of their natural correspondence with black/white
images. We introduce a notion of partitionability of graphs and show that DPGs (of any
fixed dimension) are partitionable [45]. This partitionability property helps in
constructing efficient parallel algorithms for many problems on digitized picture graphs.
We show that our techniques can be efficiently simulated on a $P \times P$ fixed-size
mesh-connected computer, $1 \le P \le n$. Unlike other approaches [78], our algorithms,
because of the partitionability idea, extend easily to problems in higher dimensions.
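The sketch below illustrates the intuition behind partitioning a DPG; it does not reproduce the formal definition of partitionability in [45]. It builds a 4-connected DPG from a binary image (vertices are black pixels, edges join horizontally or vertically adjacent black pixels), splits the image into a $P \times P$ grid of blocks, one block per PE, and counts edges that stay inside a block versus edges that cross block borders. Internal edges grow with block area while crossing edges grow only with block perimeter, so each PE can do most of its work locally. The names `dpg_edges` and `partition_stats` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, col)


def dpg_edges(img: List[List[int]]) -> List[Tuple[Pixel, Pixel]]:
    """4-connected digitized picture graph: vertices are black (1) pixels,
    edges join horizontally or vertically adjacent black pixels."""
    rows, cols = len(img), len(img[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c]:
                continue
            if c + 1 < cols and img[r][c + 1]:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < rows and img[r + 1][c]:
                edges.append(((r, c), (r + 1, c)))
    return edges


def partition_stats(img: List[List[int]], P: int) -> Tuple[int, int]:
    """Split the image into a P x P grid of blocks (one per PE) and return
    (edges inside a block, edges crossing a block border)."""
    rows, cols = len(img), len(img[0])

    def block(v: Pixel) -> Tuple[int, int]:
        r, c = v
        return (r * P // rows, c * P // cols)

    internal = crossing = 0
    for u, v in dpg_edges(img):
        if block(u) == block(v):
            internal += 1
        else:
            crossing += 1
    return internal, crossing


if __name__ == "__main__":
    full = [[1] * 8 for _ in range(8)]
    for P in (1, 2, 4):
        print(P, partition_stats(full, P))
```

For an all-black $8 \times 8$ image, going from $P = 2$ to $P = 4$ raises the crossing edges from 16 to 48 while the total stays 112. Only the border edges require communication between neighbouring PEs.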

4.1.4  VLSI  Architectures  for  Image  Transforms  and  Vector  Quantization 

We have studied VLSI architectures for various image transforms and vector
quantization techniques. Two linear array architectures have been proposed for
computing the arithmetic Fourier transform and for image compression using vector
quantization [35,36]. These architectures use modular processing elements (PEs) and
can support real-time processing. The designs can operate with fewer PEs than the
input size, and they require only a fixed I/O bandwidth to the host.
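The sketch below illustrates how a linear array can perform vector quantization encoding with fewer PEs than codewords. It is a simulation under stated assumptions, not the design in [35,36]. We assume here that the "input" folded onto the array is the codebook, that each PE holds one codeword per pass, and that squared Euclidean distance is the distortion measure. A running (best distance, best index) pair flows from PE to PE. With $P$ PEs and $K$ codewords, the input vector makes $\lceil K/P \rceil$ passes. The names `sq_dist` and `vq_encode_linear_array` are hypothetical.

```python
from typing import List, Sequence, Tuple


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distortion between an input vector and a codeword."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def vq_encode_linear_array(x: Sequence[float], codebook: List[Sequence[float]],
                           P: int) -> Tuple[int, int]:
    """Simulate a P-PE linear array encoding one vector.

    In each pass, PE k holds codeword base + k. The partial result
    (best distance, best index) is passed from PE to PE; with P < K the
    vector makes ceil(K / P) passes. Returns (best index, passes used).
    """
    K = len(codebook)
    best_d, best_i = float("inf"), -1
    passes = 0
    for base in range(0, K, P):
        passes += 1
        for k in range(P):               # partial result moves left to right
            idx = base + k
            if idx >= K:
                break                    # the last pass may leave PEs idle
            d = sq_dist(x, codebook[idx])
            if d < best_d:               # strict '<' keeps the lower index on ties
                best_d, best_i = d, idx
    return best_i, passes


if __name__ == "__main__":
    cb = [[0, 0], [10, 10], [5, 5]]
    print(vq_encode_linear_array([6, 6], cb, 2))  # (2, 2)
```

Because each PE only ever talks to its right-hand neighbour and the host streams a fixed number of values per step, the host bandwidth does not depend on the codebook size. That is the property the paragraph above attributes to the proposed designs.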

4.2  Parallelization  of  Symbolic  Techniques  in  Vision 

Relatively little work has been done on parallelizing high-level vision algorithms.
Such algorithms are usually symbolic in nature, and the processing is not entirely
local. We believe that when dealing with such complex algorithms, a parallel
implementation must be concerned with four characteristics: algorithm speedup,
processor efficiency, system complexity, and programmer burden.
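The first two characteristics are commonly quantified with the standard definitions below. The report does not state its own definitions, so these are assumed. Here $T_1$ is the serial running time and $T_P$ the running time on $P$ processors.

```latex
S(P) = \frac{T_1}{T_P}, \qquad
E(P) = \frac{S(P)}{P} = \frac{T_1}{P\,T_P}
% Example: T_1 = 100, T_P = 20, P = 8  =>  S = 5, E = 5/8 = 0.625
```

Neither quantity captures system complexity or programmer burden, which is why those two characteristics are listed separately.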

Most research in parallel processing has been concerned solely with the first two
characteristics. We have been pursuing an alternative that achieves a better balance
among all four. In our approach, we classify algorithms in terms of their operations,
data dependencies, data movements, and other algorithm characteristics, and then
specify a parallel processor architecture that is well suited to those characteristics
[51].

We have applied this methodology to a number of mid- and high-level vision
algorithms. Our first experience was with an algorithm for image matching via
relaxation labelling with symbolic objects and geometric constraints [30]. Our analysis
indicated that this algorithm is well suited to an MIMD architecture comprising
powerful processing elements programmed with the loosely synchronous protocol. A
suitable interconnection topology is one of logarithmic diameter. Two implementations
were developed, one using binary tree connections and the other using hypercube
connections. This scheme exploits the coarse-grain parallelism within the algorithm.
Further analysis shows that equipping each PE with a tightly coupled vector processor
would exploit the fine-grain parallelism within the algorithm. This architecture
achieves high degrees of speedup and efficiency while using software that is nearly
identical to that of the serial implementation; thus system complexity and programmer
burden are minimized. Details of this study were previously reported in [49].
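The sketch below illustrates the loosely synchronous structure on a hypercube. Each iteration has a compute phase, in which every PE updates only the objects it owns using the serial update routine unchanged, and a communication phase, in which a recursive-doubling all-gather gives every PE the full probability table in $\log_2 P$ exchange steps. The update rule is the standard Rosenfeld-Hummel-Zucker relaxation labelling rule, swapped in for illustration; [30] may use a different rule, and its symbolic objects and geometric constraints are not modelled. The names `update_block`, `hypercube_allgather`, and `relax`, and the uniform neighbour weight $1/(n-1)$, are assumptions.

```python
from typing import Dict, List, Tuple

Probs = List[List[float]]                # p[i][l]: P(object i has label l)
Compat = List[List[List[List[float]]]]   # r[i][j][l][m] in [-1, 1]
Block = Dict[int, List[float]]           # rows owned by one PE


def update_block(p: Probs, r: Compat, rows: List[int]) -> Block:
    """Compute phase for one PE: RHZ-style update of the objects it owns."""
    n, k = len(p), len(p[0])
    out: Block = {}
    for i in rows:
        q = [0.0] * k                    # support for each label of object i
        for j in range(n):
            if j == i:
                continue
            for l in range(k):
                q[l] += sum(r[i][j][l][m] * p[j][m] for m in range(k)) / (n - 1)
        s = [p[i][l] * (1.0 + q[l]) for l in range(k)]
        z = sum(s)
        out[i] = [v / z for v in s] if z > 0 else list(p[i])
    return out


def hypercube_allgather(local: List[Block]) -> Tuple[List[Block], int]:
    """Recursive doubling on a 2**d-node hypercube: in step k, each PE merges
    everything held by its neighbour across dimension k."""
    P = len(local)
    d = P.bit_length() - 1
    if 1 << d != P:
        raise ValueError("number of PEs must be a power of two")
    held = [dict(x) for x in local]
    for k in range(d):
        nxt = [dict(h) for h in held]
        for pe in range(P):
            nxt[pe].update(held[pe ^ (1 << k)])
        held = nxt
    return held, d


def relax(p: Probs, r: Compat, P: int, iters: int) -> Tuple[Probs, int]:
    """Loosely synchronous loop; returns final probabilities and total exchange steps."""
    n = len(p)
    blocks = [list(range(b, n, P)) for b in range(P)]   # cyclic object assignment
    comm_steps = 0
    for _ in range(iters):
        local = [update_block(p, r, blk) for blk in blocks]   # compute phase
        held, d = hypercube_allgather(local)                  # communication phase
        comm_steps += d
        p = [held[0][i] for i in range(n)]   # every PE now holds the same table
    return p, comm_steps
```

Because `update_block` is the same routine a serial program would call on all rows, the result is identical for any power-of-two $P$. This mirrors the claim that the parallel software stays nearly identical to the serial implementation.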

In more recent work, we have studied an algorithm for object recognition that
uses graph matching. Our analysis again indicates that an MIMD architecture with
powerful processing elements, connected in a hypercube topology, is suited to this
problem. Once more, we achieve significant algorithm speedup and processor
efficiency while using software that is nearly identical to that of the serial
implementation.
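The sketch below shows one common way that serial graph-matching code can be reused almost unchanged on an MIMD machine. The backtracking search tree is split statically by the image vertex assigned to the first model vertex, and each worker runs the serial routine on its share. The report does not specify its matching formulation; subgraph monomorphism (every model edge must map onto an image edge) is assumed here, and all function names are hypothetical. Python threads stand in for PEs only to show the decomposition; because of the interpreter lock, they would not give CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]          # undirected graph as adjacency sets
Mapping = Dict[int, int]             # model vertex -> image vertex


def extend(model: Graph, image: Graph, order: List[int],
           mapping: Mapping, used: Set[int], out: List[Mapping]) -> None:
    """Serial backtracking search: the routine each PE runs unchanged."""
    if len(mapping) == len(order):
        out.append(dict(mapping))
        return
    u = order[len(mapping)]
    for v in image:
        if v in used:
            continue
        # every already-mapped neighbour of u must land on a neighbour of v
        if all(mapping[w] in image[v] for w in model[u] if w in mapping):
            mapping[u] = v
            used.add(v)
            extend(model, image, order, mapping, used, out)
            del mapping[u]
            used.discard(v)


def match_subtree(model: Graph, image: Graph, order: List[int],
                  first_images: List[int]) -> List[Mapping]:
    """One PE's work: all matches whose first model vertex maps into first_images."""
    out: List[Mapping] = []
    for v in first_images:
        extend(model, image, order, {order[0]: v}, {v}, out)
    return out


def parallel_match(model: Graph, image: Graph, P: int) -> List[Mapping]:
    """Statically split the search tree over P workers and merge the results."""
    order = sorted(model)
    candidates = sorted(image)
    parts = [candidates[k::P] for k in range(P)]
    with ThreadPoolExecutor(max_workers=P) as pool:
        results = list(pool.map(
            lambda part: match_subtree(model, image, order, part), parts))
    return [m for res in results for m in res]
```

For example, matching a triangle against a square with one diagonal finds two triangles, each in six vertex orders, for 12 mappings, regardless of how many workers share the search. The workers only read the shared graphs, so they need no synchronization until the final merge.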

Lastly, we have studied the mid-level operations of linear feature extraction and
perceptual organization. These operations may appear simple and repetitive, and
thus well suited to SIMD implementation. However, this is not the case. Parallel
implementations typically focus on a single algorithm in isolation: they assume that
the input is given in the desired form and produce the output in some form, without
regard to how later stages will use it. In a system (or subsystem) that comprises a
number of processing steps, converting the output of one stage into the form required
by the next can itself be a major step, possibly requiring a serial implementation. A
simple example is linear feature extraction. Finding edges, and then the neighbors
that together form curves, is an iconic process that is easily implemented on a SIMD
machine. However, this is different from actually producing a list of curves, each
curve given by the ordered list of points forming it, and possibly a linear
approximation to it as well. This is the structure needed for subsequent linear
feature processing. Our proposed implementation is described in detail in [50].
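The sketch below illustrates the conversion step described above: turning an unordered set of edge pixels, the natural output of the iconic SIMD stage, into ordered point lists, one per curve. It is a simple serial tracer written for illustration, not the implementation in [50]. It assumes 8-connectivity, starts at endpoints so open curves are traced end to end, and follows a single branch at junctions, so other branches become separate curves. The names `neighbours` and `trace_curves` are hypothetical. Each step depends on the previous pixel, which is why this stage resists a purely SIMD formulation.

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[int, int]  # (row, col) of an edge pixel


def neighbours(p: Point, pts: Set[Point]) -> List[Point]:
    """8-connected neighbours of p that are edge pixels."""
    r, c = p
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and (r + dr, c + dc) in pts]


def trace_curves(edge_pixels: Iterable[Point]) -> List[List[Point]]:
    """Turn an unordered set of edge pixels into ordered point lists."""
    pts = set(edge_pixels)
    unvisited = set(pts)
    # Start at endpoints (exactly one neighbour) so open curves are traced end
    # to end; any remaining pixels (e.g. closed loops) are picked up afterwards.
    starts = sorted(p for p in pts if len(neighbours(p, pts)) == 1) + sorted(pts)
    curves: List[List[Point]] = []
    for s in starts:
        if s not in unvisited:
            continue
        curve, cur = [s], s
        unvisited.discard(s)
        while True:
            nxt = [q for q in neighbours(cur, pts) if q in unvisited]
            if not nxt:
                break
            # prefer 4-connected steps so corner pixels are not skipped
            cur = min(nxt, key=lambda q, c=cur: (abs(q[0] - c[0]) + abs(q[1] - c[1]), q))
            curve.append(cur)
            unvisited.discard(cur)
        curves.append(curve)
    return curves


if __name__ == "__main__":
    pixels = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (5, 0), (5, 1)]
    for curve in trace_curves(pixels):
        print(curve)
```

A linear approximation, if needed, would be fitted to each returned point list in a further pass.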

In conclusion, our research has shown that the complex operations and data
movements required by mid- and high-level vision can be performed efficiently if care
is taken in specifying the parallel processor architecture. Implementations that
achieve high degrees of algorithm speedup and processor efficiency can be attained
without increasing system complexity or programmer burden. Such architectures
can be realized with heterogeneous designs or with reconfigurable architectures,
given an efficient reconfiguration procedure. We have certainly not studied all the
algorithms used in vision, but we believe that the algorithms we selected span enough
of the field to indicate that this approach is worth pursuing further. We also believe
that the next step is to investigate the parallel implementation of complete,
heterogeneous vision systems that comprise low-, mid-, and high-level algorithms.